Gradient Descent: The Ultimate Optimizer

Neural Information Processing Systems

Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model parameters by manually deriving expressions for "hypergradients" ahead of time. We show how to automatically compute hypergradients with a simple and elegant modification to backpropagation. This allows us to easily apply the method to other optimizers and hyperparameters (e.g. We can even recursively apply the method to its own hyper-hyperparameters, and so on ad infinitum. As these towers of optimizers grow taller, they become less sensitive to the initial choice of hyperparameters.
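
As a rough illustration of what a hand-derived hypergradient looks like (the earlier approach the abstract mentions, not the automatic backpropagation-based method the paper proposes), the following Python sketch tunes the step size of plain gradient descent on a toy problem. The loss `f(w) = 0.5 * w**2`, the function names, and the hyper-step-size `kappa` are all assumptions made for this example.

```python
def loss_grad(w):
    # Gradient of the toy loss f(w) = 0.5 * w**2 (assumed for illustration).
    return w


def sgd_with_hypergradient(w, alpha, kappa, steps):
    """Gradient descent on w that also adapts its own step size alpha.

    kappa is the (fixed) step size used to update alpha.
    Setting kappa = 0.0 recovers ordinary gradient descent.
    """
    g_prev = 0.0  # no previous gradient before the first step
    for _ in range(steps):
        g = loss_grad(w)
        # The previous update was w = w_prev - alpha * g_prev, so by the
        # chain rule d f(w) / d alpha = g * (-g_prev). This is the
        # hand-derived hypergradient, with g_prev treated as a constant.
        hypergrad = -g * g_prev
        alpha -= kappa * hypergrad  # descend on the step size
        w -= alpha * g              # descend on the parameter
        g_prev = g
    return w, alpha


if __name__ == "__main__":
    print(sgd_with_hypergradient(1.0, 0.01, 0.1, 50))  # adaptive step size
    print(sgd_with_hypergradient(1.0, 0.01, 0.0, 50))  # fixed step size
```

In this toy setting, consecutive gradients point the same way, so the step size grows from its deliberately small initial value and the adaptive run ends closer to the minimum than the fixed-step run.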


SCoTTi: Save Computation at Training Time with an adaptive framework

arXiv.org Artificial Intelligence

On-device training is an emerging approach in machine learning where models are trained on edge devices, aiming to enhance privacy protection and real-time performance. However, edge devices typically possess restricted computational power and resources, making it challenging to perform computationally intensive model training tasks. Consequently, reducing resource consumption during training has become a pressing concern in this field. To this end, we propose SCoTTi (Save Computation at Training Time), an adaptive framework that addresses this challenge. It leverages an optimizable threshold parameter to reduce the number of neuron updates during training, which corresponds to a decrease in memory and computation footprint. Our proposed approach demonstrates superior performance compared to state-of-the-art methods in terms of computational resource savings on various commonly employed benchmarks and popular architectures, including ResNets, MobileNet, and Swin-T.


#NeurIPS2022 outstanding paper – Gradient descent: the ultimate optimizer

AIHub

Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley and Erik Meijer won a NeurIPS 2022 outstanding paper award for their work Gradient descent: the ultimate optimizer. Here, they tell us more about their work, the methodology and their main findings. Our paper studies the classic problem of "hyperparameter optimization". Nearly all of today's machine learning algorithms use a process called "stochastic gradient descent" (SGD) to train neural networks. SGD requires users to pick certain settings, or "hyperparameters," before running it.


Gradient Descent: The Ultimate Optimizer

arXiv.org Machine Learning

Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce even more hyperparameters to control the hyperparameter optimization process. We propose to instead learn the hyperparameters themselves by gradient descent, and furthermore to learn the hyper-hyperparameters by gradient descent as well, and so on ad infinitum. As these towers of gradient-based optimizers grow, they become significantly less sensitive to the choice of top-level hyperparameters, hence decreasing the burden on the user to search for optimal values.
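
The idea of stacking optimizers can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's implementation: the toy loss `f(w) = 0.5 * w**2`, the name `tower_descent`, and the one-step chain-rule approximation (each level's gradient is the negated product of the current and previous gradients of the level below, with older gradients treated as constants) are all choices made for this example, whereas the paper obtains these gradients through automatic differentiation.

```python
def loss_grad(w):
    # Gradient of the toy loss f(w) = 0.5 * w**2 (assumed for illustration).
    return w


def tower_descent(w, rates, steps):
    """Gradient descent with a tower of learned step sizes.

    rates[0] is the step size for w, rates[1] is the step size used to
    update rates[0], and so on. Only the last entry stays fixed.
    """
    rates = list(rates)          # do not mutate the caller's list
    prev = [0.0] * len(rates)    # previous gradient at each level
    for _ in range(steps):
        # grads[0] is the gradient w.r.t. w; grads[k] is the approximate
        # gradient w.r.t. rates[k - 1], via a one-step chain rule.
        grads = [loss_grad(w)]
        for level in range(1, len(rates)):
            grads.append(-grads[level - 1] * prev[level - 1])
        # Update from the top of the tower down, so each level uses the
        # freshly updated step size of the level above it.
        for level in range(len(rates) - 1, 0, -1):
            rates[level - 1] -= rates[level] * grads[level]
        w -= rates[0] * grads[0]
        prev = grads
    return w, rates


if __name__ == "__main__":
    print(tower_descent(1.0, [0.01], 50))             # plain gradient descent
    print(tower_descent(1.0, [0.01, 0.1, 0.01], 50))  # tower of height two
```

With a single rate the function reduces to ordinary gradient descent; adding levels lets the lower step sizes move away from a poor initial value, which is the effect the abstract describes.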